The nucleotide sequence of the human coronavirus 229E (HCV 229E) RNA polymerase gene and the 5' region of the
genome has been determined. The polymerase gene comprises two large open reading frames, ORF1a and
ORF1b, that contain 4086 and 2687 codons, respectively. ORF1b overlaps ORF1a by 43 bases in the (-1) reading
frame. The in vitro translation of SP6 transcripts which include HCV 229E sequences encompassing the ORF1a/ORF1b
junction shows that expression of ORF1b can be mediated by ribosomal frame-shifting. The predicted translation
products of ORF1a (454,200 molecular weight) and ORF1a/1b (754,200 molecular weight) have been compared to the
predicted RNA polymerase gene products of infectious bronchitis virus (IBV) and murine hepatitis virus (MHV), and
conserved structural features and putative functional domains have been identified. This analysis completes the
nucleotide sequence of the HCV 229E genome. © 1993 Academic Press, Inc.


INTRODUCTION 

The human coronaviruses (HCV) are the cause of
upper respiratory illness and it has been estimated that
up to 20% of common colds are caused by HCV
(McIntosh et al., 1974; Isaacs et al., 1983; Macnaughton
et al., 1983). Although the symptoms of HCV-related
colds are generally mild and the duration of illness is
short, the economic consequences of HCV infection
are significant (Hierholzer and Tannock, 1988). Also,
the possible association of HCV infection with more
severe respiratory tract illness in children (Matsumoto
and Kawano, 1992) or as a precipitant of asthmatic
exacerbations (Pattemore et al., 1992) needs to be
further investigated.

It has been established that there are two major
antigenic groups of HCV, represented by the prototypes
HCV 229E and HCV OC43. The major structural
components of HCV 229E and HCV OC43 virions have
been identified and there is some limited information
on the synthesis of viral RNA and proteins in the
infected cell (Schmidt and Kenny, 1982; Schmidt, 1984;
Kemp et al., 1984; Hogue and Brian, 1986; Schreiber
et al., 1989; Raabe et al., 1990; Arpin and Talbot,
1990).

The HCV 229E genome is a positive-strand RNA
with an estimated size of 6 × 10^6 (Macnaughton and
Madge, 1978). To date, the nucleotide sequence of
approximately 7 kilobases extending from the 3' end of
the genome has been determined. This region
encodes the nucleocapsid protein, N (Schreiber et al.,
1989; Myint et al., 1990), the membrane glycoprotein,
M (Raabe and Siddell, 1989b; Jouvenne et al., 1990)
and the surface glycoprotein, S (Raabe et al., 1990).
Additionally, there are three small open reading frames
(ORFs), ORF4a, ORF4b, and ORF5, located between
the S and the M protein genes (Raabe and Siddell,
1989a; Jouvenne et al., 1992). It seems likely that the
putative HCV 229E ORF5 gene product is a virion
structural protein (Liu and Inglis, 1991; Godet et al., 1992)
but the function of the putative ORF4a and ORF4b
gene products is unknown.

In coronavirus-infected cells, the viral genes are
expressed from the genomic and subgenomic mRNAs.
The subgenomic mRNAs form a 3' coterminal set and
are synthesized by a process of discontinuous
transcription (for a recent review see Lai, 1990). In the case
of HCV 229E, seven positive-strand RNA species
(numbered 1 to 7 in order of decreasing size) have been
identified in the infected cell. The translation products
of the S, ORF4a and 4b, ORF5, M, and N protein genes
have been provisionally assigned to RNA 2, 4, 5, 6, and
7, respectively (Raabe et al., 1990; Schreiber et al.,
1989), although messenger RNA function has been
confirmed only for RNA 7 (Myint et al., 1990). It is not
yet clear whether RNA 3 should be considered as a
putative mRNA (Raabe et al., 1990).

The remainder of the HCV 229E genome encompasses
the unique region of RNA 1, i.e., the genomic
RNA. This region, which is referred to as gene 1, the
RNA polymerase gene or the RNA polymerase locus,
has been entirely sequenced for IBV and MHV
(Boursnell et al., 1987; Bredenbeek et al., 1990; Lee et al.,
1991) and comprises two large ORFs, ORF1a and
ORF1b, which overlap by 40-80 bases. The upstream
ORF1a potentially encodes a polypeptide of 450,000
to 500,000 molecular weight. The downstream ORF1b
potentially encodes a polypeptide of 300,000 molecular
weight. The downstream ORF1b is expressed,
however, as a fusion protein together with the ORF1a gene
product by a mechanism involving (-1) ribosome
slippage (Brierley et al., 1987). This ribosomal frameshift is
mediated by a "slippery sequence" and pseudoknot
structure located in a region of the genome
encompassing the overlap of ORF1a and ORF1b (Brierley et
al., 1989, 1991, 1992; Bredenbeek et al., 1990; Lee et
al., 1991).

Fig. 1. Organization of the HCV 229E genome and the position of cDNA clones encompassing gene 1. The major ORFs are represented as
boxes in the 0, -1, and -2 reading frames. The known and putative structural genes (S, 5, M, and N) are shaded and gene 1 (Pol1a and Pol1b) is
cross-hatched. The size and position of the cDNA clones and PCR amplification products used to determine the HCV 229E gene 1 sequence are
shown. The relationship of the intracellular poly(A) RNA species to the genome is also illustrated.

In the case of IBV and MHV, the ORF1b regions are
relatively conserved whereas the ORF1a regions have
diverged, in particular toward their 5' ends. It is evident
that these two large ORFs must encode a number of
different functions. First, there are functions related to
RNA replication. Complementation analysis of MHV ts
mutants with an RNA-minus phenotype has shown that
there are at least five distinct viral functions related to
RNA synthesis (Leibowitz et al., 1982; Schaad et al.,
1990). Analysis of these mutants by genetic
recombination allows the different functions to be located and
ordered within the gene 1 locus (Keck et al., 1987;
Baric et al., 1990). Also, both IBV and MHV contain in
their ORF1b sequence motifs characteristic of RNA
polymerases, helicases, and metal binding proteins
(Gorbalenya et al., 1989; Bredenbeek et al., 1990).


Second, there is genetic and biochemical evidence
that the MHV gene 1 contains viral encoded proteases.
The complementation frequencies of MHV ts mutants
are indicative of intergenic, rather than intragenic
complementation (Leibowitz et al., 1982) and an
autoproteolytic activity has been mapped to the middle of the
MHV ORF1a (Baker et al., 1989). Motifs characteristic
of both papain-like and picornavirus 3C-like cysteine
proteases have also been identified in ORF1a of MHV
and IBV (Gorbalenya et al., 1989; Lee et al., 1991).

Finally, the large size of the coronavirus gene 1
region (approximately 20 kilobases) suggests that it may
encode many, as yet unidentified, functions. One
obvious candidate would be a methyltransferase activity
necessary for the generation of capped viral RNA in the
cytoplasm of infected cells. Other functions may be
related to the conserved "membrane protein,"
"cysteine-rich," and "X" domains which have been
identified in gene 1 of IBV and MHV (Gorbalenya et al., 1989;
Lee et al., 1991).

In this paper we report the nucleotide sequence of
the human coronavirus 229E gene 1 and the 5' region
of the genome. This analysis completes the nucleotide
sequence of HCV 229E. Furthermore, we provide
evidence that, in common with IBV and MHV, HCV 229E
ORF1b expression is mediated by (-1) ribosomal
frame-shifting. The identification of structural and
putative functional motifs in the predicted HCV gene 1
product and a comparison of their organization in the
gene 1 proteins of HCV 229E, IBV, and MHV are also
presented.

  51 GTGTCTACTTTTCTCAACTAAACGAAATTTTTGCTATGGCCGGCATCTTT
                                        M  A  G  I  F
 101 GATGCTGGAGTCGTAGTGTAATTGAAATTTCATTTGGGTTGCAACAGTTT
     D  A  G  V  V  V  *
 151 GGAAGCAAGTGCTGTGTGTCCTAGTCTAAGGGTTTCGTGTTCCGTCACGA
 201 GATTCCATTCTACAAACGCCTTACTCGAGGTTCCGTCTCGTGTTTGTGTG
 251 GAAGCAAAGTTCTGTCTTTGTGGAAACCAGTAACTGTTCCTAATGGCCTG
                                               M  A  C
 301 CAACCGTGTGACACT1GCCGTAGCAAGTGATTCTGAAATTTCTGCAAATG
      N  R  V  T  L  A  V  A  S  D  S  E  I  S  A  N  G

     A-tailed    T-tailed

Fig. 2. Consensus sequence of cDNA clones representing the 5'
region of the HCV 229E genome. (A) The consensus sequence. The
intergenic motif, UCUCAAC, is underlined. ORF1a is initiated with
AUG at position 293 and the conserved 5' "minicistron" is indicated.
(B) Sequence analysis of the 5' RACE products. The sequences of the
A-tailed and T-tailed 5' RACE products derived from the HCV leader
RNA were determined as described in Methods. The 5' terminal A
nucleotide is indicated.


MATERIALS AND METHODS 
Virus and cells 

The HCV 229E isolate used in these studies, the
methods of virus propagation in C16 cells, and the
isolation of cytoplasmic, poly(A)-containing RNA from
HCV 229E-infected cells have been described (Raabe
et al., 1990).

cDNA cloning 

cDNA synthesis was done by the method of Gubler
and Hoffman (1983) using random hexanucleotides or
the HCV 229E S gene specific oligonucleotide 1
(Raabe et al., 1990) as reverse transcription primers.
The synthesized double-stranded cDNA was
size-fractionated on a Sephacryl S-1000 column, cloned into
pBluescript II KS+ and transformed into competent
Escherichia coli TG-1 cells. Recombinant clones were
screened by colony hybridization with HCV
229E-specific, 32P-labeled oligonucleotides or HCV
229E-specific cDNAs. Standard recombinant DNA procedures
were done as described by Sambrook et al. (1989) and
colony hybridizations were done as described by
Woods (1984).

PCR 

PCR was done using a GeneAmp/RNA PCR Kit
according to the manufacturer's procedures
(Perkin-Elmer Cetus, Überlingen, Germany). The biotinylated
oligonucleotide 2 was used as upstream primer and
oligonucleotide 3 was used as downstream and
reverse transcription primer. The resulting cDNA strands
were separated using streptavidin-coupled magnetic
beads, according to the manufacturer's protocol
(Dynal, Hamburg, Germany) and the nucleotide
sequence of both strands was determined.




Fig. 3. Dot matrix comparisons of the predicted amino acid
sequences of the ORF1a proteins of HCV 229E, IBV, and MHV.
Comparisons of the HCV and IBV proteins (upper panel) and the HCV and
MHV proteins (lower panel) were generated using the GCG program
COMPARE (window, 100; stringency, 30; default comparison table)
and displayed with the program DOTPLOT.

Fig. 4. Putative functional domains of the HCV 229E ORF1a translation product. The amino acid sequences of ORF1a of HCV 229E, MHV, and
IBV were aligned using the UWGCG program PileUp (default settings) and the structgappep.cmp (A) or pam250.cmp (B and C) comparison
tables. (A) The papain-like protease motifs; (B) the 3C-like protease motif; (C) the growth factor/receptor-like motif. In (A) and (B) the catalytic
residues proposed by Lee et al. (1991) are shown in bold type and marked with an asterisk. In (C) the putative disulphide bond residues proposed
by Gorbalenya et al. (1989) are similarly highlighted. The numbering of the aligned sequences is for reference only.


DNA sequencing 

Sequencing was done on single-strand and
double-strand templates using the chain termination method
and M13, T7, T3, and HCV 229E-specific sequencing
primers. To generate cDNA sequencing templates,
overlapping deletions were introduced by
unidirectional exonuclease III digestion (Henikoff, 1984). Both
strands of all cDNAs and the PCR product were
sequenced. Sequence data were assembled by the
program of Staden (1982) and analysed by the programs
of the Genetics Computer Group, Inc. (Devereux et al.,
1984).

5' RACE 

Sequences at the 5' end of the HCV 229E leader
RNA were determined by a "rapid amplification of
cDNA ends" method (Frohmann et al., 1988). A
32P-labeled oligonucleotide, 4, complementary to a region of
the HCV 229E leader RNA (Schreiber et al., 1989), was
used as primer for the reverse transcription of
cytoplasmic, poly(A)-containing RNA from HCV-infected
cells. Reverse transcription was done with Superscript
RNase H⁻ reverse transcriptase (Gibco, Eggenstein,
Germany) using the manufacturer's protocol. The
largest product was purified by gel electrophoresis and
tailed with dATP or dTTP using terminal transferase.
The tailed cDNAs were then amplified in separate
three-primer PCRs (A-tailed product: oligonucleotides 4 and
5 and biotinylated oligonucleotide 2; T-tailed product:
oligonucleotides 4 and 6 and biotinylated
oligonucleotide 2). The amplifications were done using AmpliTaq
DNA polymerase (Perkin-Elmer Cetus) by heating the
reaction to 94°, followed by 3 cycles of denaturation
(94°, 1 min), annealing (45°, 1 min), and extension
(72°, 5 sec), 30 cycles of denaturation (94°, 1 min),
annealing (51°, 1 min), and extension (72°, 5 sec) and a
final extension step of 72° for 10 min. The cDNA
strands were separated using streptavidin-coupled
magnetic beads and the biotinylated strand was
sequenced using primer 4.

Oligonucleotides 

Oligonucleotides were synthesized using
phosphoramidite chemistry on a Cyclone DNA synthesizer
and purified by gel electrophoresis. The 5' biotinylated
oligonucleotide was purchased (MWG-Biotech,
Ebersberg, Germany). The oligonucleotides used for cDNA
synthesis, PCR, and 5' RACE were

1. 5'-CAT CTA CAA CAG ATG AGG-3'

2. 5'-BIOTIN GCC TAT GAA AGT GCT GTT GTT AAT GG-3'

3. 5'-TTA GAT TTA AGA ACA GCC TGT GAC GC-3'

4. 5'-GTA GAC ACA AAG TCT AAA AAG C-3'

5. 5'-GCC TAT GAA AGT GCT GTT GTT AAT GGT18-3'

6. 5'-GCC TAT GAA AGT GCT GTT GTT AAT GGA18-3'.

Fig. 5. Analysis of HCV RNA-mediated ribosomal frame-shifting. (A) The consensus sequence of cDNA clones in the region of the ORF1a/
ORF1b overlap. The ends of the ORF1a and ORF1b sequences are indicated. The putative slippage site, TTTAAAC, is shown in bold type and
the complementary sequences which we propose to form the S1 and S2 stems are overlined. (B) A proposed model of the HCV 229E
pseudoknot structure at the ORF1a-ORF1b junction. (C) The structure of plasmids pFS and pAFS. The DNA structure of pFS and pAFS is
schematically shown together with the position of the HCV 229E ORF1a/ORF1b overlap. The size of the SP6 run-off transcription products and
the translation products predicted in the event of ORF1a termination or (-1) ribosomal frame-shifting are shown. (D) In vitro translation products
of pFS and pAFS mRNA. Lane M, molecular weight markers (CFA626, Amersham Buchler, Braunschweig, Germany); lane 1, no RNA; lane 2,
pAFS/BamHI RNA; lane 3, pFS/AflII RNA; lane 4, pFS/BstEII RNA.

Construction of plasmids pFS and pAFS 

A 1264 base pair NdeI-HpaI fragment of clone
T16D8 (corresponding to bases 12,293-13,557 in the
HCV genome, see Fig. 1), or a 427 base pair
NdeI-Sau96I fragment of clone T16D8 (corresponding to
bases 12,293-12,720) was treated with the Klenow
fragment of DNA polymerase and exchanged with the
small (230 base pair) EcoRV fragment of pSP65-GUS
(Prufer et al., 1992). The clones containing the HCV
DNA fragments in the correct orientation (pFS and
pAFS, respectively) were identified by restriction
enzyme analysis and the constructions were verified by
sequencing.


In vitro transcription and translation 

Plasmid DNA was linearized with AflII or BstEII (pFS)
or BamHI (pAFS) and transcribed with SP6 RNA
polymerase as described by Melton et al. (1984). The in
vitro synthesized, capped RNAs were translated in a
rabbit reticulocyte lysate in the presence of
[35S]methionine and the products were analyzed on 10%
polyacrylamide-SDS gels as described previously
(Siddell, 1983). The radioactivity incorporated into the
translation products was determined using a
PhosphorImager Model 400E (Molecular Dynamics,
Sunnyvale, USA).


RESULTS 


Molecular cloning and nucleotide sequence of the 
HCV 229E gene 1 

The HCV 229E gene 1 was cloned in a series of 21
cDNA clones. One region, corresponding to bases
5781-5934 in the genomic RNA, was not represented
in the three cDNA libraries screened and it was
sequenced by PCR techniques. More than 85% of the
sequence presented was determined on two or more
cDNA clones. The sequence encompassing the
"frameshift region" (see below) was obtained on five
independent cDNA clones from two cDNA libraries
(Fig. 1). The length of the consensus cDNA sequence
is 20,774 nucleotides, extending from base 1 at the 5'
end of the genome to base 20,774, which corresponds
to the 68th codon of the HCV S protein gene. Within
this sequence are two large ORFs. ORF1a is initiated
with an AUG at base 293 and contains 4086 codons.
ORF1b, which is initiated at base 12,508 with CAG,
contains 2687 codons and overlaps ORF1a by 43
bases in the (-1) reading frame. The nucleotide
sequence of the HCV 229E gene 1 has been deposited
with the EMBL/GenBank/DDBJ Nucleotide Sequence
Data Libraries and is available under accession
number X69721.
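The coordinates quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the stated codon counts include the termination codon (which agrees with the 4085 and 2686 amino acid products reported for ORF1a and ORF1b); the function and variable names are illustrative only.

```python
# Coordinates taken from the text; names are illustrative.
ORF1A_START = 293      # first base of the ORF1a AUG
ORF1A_CODONS = 4086    # assumed to include the termination codon
ORF1B_START = 12508    # first base of the ORF1b CAG
ORF1B_CODONS = 2687    # assumed to include the termination codon


def orf_end(start, codons):
    """Return the position of the last base of an ORF (1-based, inclusive)."""
    return start + 3 * codons - 1


orf1a_end = orf_end(ORF1A_START, ORF1A_CODONS)
orf1b_end = orf_end(ORF1B_START, ORF1B_CODONS)

# Number of bases shared by the 3' end of ORF1a and the 5' end of ORF1b.
overlap = orf1a_end - ORF1B_START + 1

# Reading frame of ORF1b relative to ORF1a, expressed as -1, 0, or +1.
offset = (ORF1B_START - ORF1A_START) % 3
relative_frame = ((offset + 1) % 3) - 1

print(orf1a_end, orf1b_end, overlap, relative_frame)
```

Under this assumption the sketch reproduces the 43 base overlap and the (-1) frame relationship stated in the text, and places the end of ORF1b upstream of base 20,774.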


5' Region of the genome 

The consensus sequence of cDNAs which
encompass the region of the HCV 229E genome preceding
the ORF1a initiation codon was deduced from the
sequence of cDNA clones J12E6 and T35D5 together
with 5' RACE clones produced from poly(A)-containing
RNA (Fig. 2). The validity of the HCV 229E sequence,
therefore, depends upon the assumption that the
mRNA leader sequence is equivalent to the 5' end of
the genomic RNA. This equivalence has been
demonstrated for MHV (Shieh et al., 1987).

Fig. 6. Dot matrix comparisons of the predicted amino acid
sequences of the ORF1b proteins of HCV 229E, IBV, and MHV.
Comparisons of the HCV and IBV proteins (upper panel) and the HCV and
MHV proteins (lower panel) were generated using the GCG program
COMPARE (window, 100; stringency, 30; default comparison table)
and displayed with the program DOTPLOT.

The genomic sequence begins with an adenine. At
position 62-68 the sequence UCUCAAC is found. This
or a closely related sequence is located adjacent to the
5' end of all HCV 229E genes and represents the
so-called "intergenic consensus" sequence which is
believed to have a pivotal role in the discontinuous
transcription of coronavirus mRNAs (Joo and Makino,
1992). At position 86 a short ORF of 12 codons is
initiated with AUG. This ORF would be unremarkable
except that similar ORFs are conserved in the genomes
of IBV and MHV (Boursnell et al., 1987; Soe et al.,
1987). It might be speculated that this ORF has a role,
for example, in the regulation of the initiation of protein
synthesis from genomic RNA.
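The positions of the intergenic motif and the short upstream ORF can be checked directly against bases 51-150 of the consensus sequence shown in Fig. 2. The following is a minimal sketch, assuming the standard genetic code and the cDNA (T in place of U) representation of the sequence; the variable names are illustrative only.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Bases 51-150 of the consensus sequence as printed in Fig. 2 (cDNA sense).
FIRST_POSITION = 51
SEQ = (
    "GTGTCTACTTTTCTCAACTAAACGAAATTTTTGCTATGGCCGGCATCTTT"
    "GATGCTGGAGTCGTAGTGTAATTGAAATTTCATTTGGGTTGCAACAGTTT"
)

# Genome position of the intergenic motif (UCUCAAC in the RNA).
motif_position = SEQ.find("TCTCAAC") + FIRST_POSITION

# First AUG in this window and its translation up to the stop codon.
start_index = SEQ.find("ATG")
start_position = start_index + FIRST_POSITION
peptide = ""
for i in range(start_index, len(SEQ) - 2, 3):
    residue = CODON_TABLE[SEQ[i:i + 3]]
    if residue == "*":
        break
    peptide += residue
codons_including_stop = len(peptide) + 1

print(motif_position, start_position, peptide, codons_including_stop)
```

The sketch locates the motif at position 62 and the AUG at position 86, and the translated minicistron spans 12 codons when the termination codon is counted, in agreement with the text.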

ORF1a

Structural features. ORF1a has the potential to
encode a protein of 4085 amino acids with a predicted
molecular weight of 454,200. The hydrophilicity profile
of the predicted protein (data not shown) shows
several regions in which hydrophobic residues
predominate. Particularly striking are the regions encompassed
by amino acids 2720-2890 and 3270-3510. These
regions represent potential membrane spanning
domains.

A comparison of the predicted HCV ORF1a protein
with the corresponding proteins of IBV and MHV using
the GCG program GAP (default settings) indicates
51.2% similarity (27.3% identity) between the HCV and
IBV proteins and 51.4% similarity (28.0% identity) for
the HCV and MHV proteins after optimal alignment. A
more detailed analysis using the COMPARE program
(window, 100; stringency, 30; default comparison
table) illustrates that in all three proteins the regions of
greatest similarity are located in the carboxy-terminal
half of the molecule (Fig. 3).
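The window/stringency criterion used for the dot matrix comparisons can be illustrated with a short sketch. This is not the GCG COMPARE implementation: it assumes a simple identity score (1 for a match, 0 otherwise) in place of the default comparison table, and the function name and toy sequences are hypothetical.

```python
def dot_matrix(seq_a, seq_b, window=100, stringency=30, score=None):
    """Return (i, j) window start pairs whose summed score meets the stringency.

    A dot is recorded when the residues in the window starting at i in seq_a
    and at j in seq_b, compared position by position, score at least
    `stringency`. By default an identity score is used (assumption).
    """
    if score is None:
        score = lambda x, y: 1 if x == y else 0
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            total = sum(score(seq_a[i + k], seq_b[j + k]) for k in range(window))
            if total >= stringency:
                dots.append((i, j))
    return dots


# Toy example with a short window so that the result is easy to follow.
dots = dot_matrix("ABCDEFG", "XXCDEFX", window=3, stringency=3)
print(dots)
```

With full-length proteins, a diagonal run of dots indicates a region of similarity at corresponding positions, which is how the carboxy-terminal conservation appears in Fig. 3. The brute-force double loop is adequate for illustration but slow for sequences of several thousand residues.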

Putative functional domains. The predicted ORF1a
proteins of IBV and MHV have been analyzed in detail
and motifs which are thought to represent domains
with specific functions have been identified
(Gorbalenya et al., 1989; Lee et al., 1991; Bredenbeek et al.,
1990). This analysis can be extended by comparison of
the predicted HCV 229E ORF1a protein with those of
IBV and MHV and the results of this analysis are shown
in Fig. 4.

The first domains which can be recognized in the
HCV protein display motifs indicative of papain-like
proteases. In HCV 229E, as in MHV, two such motifs
are found, located between amino acids 1041-1234
and 1688-1886 (Fig. 4A). The most characteristic
feature of these motifs is the conserved putative catalytic
Cys and His residues located at positions 1054 and
1701 and 1205 and 1863, respectively, in the HCV
protein.

The second motif identified in the predicted HCV
protein is related to the picornavirus 3C-like protease
domain. This motif is located between amino acids
2965-3265 (Fig. 4B). It should be noted that the
features which distinguish the coronavirus 3C-like motif
from other 3C-like protease motifs (a Gly → Tyr
substitution in the vicinity of the proposed catalytic Cys
residue and the absence of a conserved Asp/Glu as a third
catalytic site residue) are maintained in the predicted
HCV protein.

The third HCV ORF1a motif which has been
identified is a cysteine-rich domain located between amino
acids 3933-4069 (Fig. 4C). This motif has been
recognized in the MHV and IBV genomes and is related to
motifs found in growth factors and their receptors.

The frame-shifting region 

By analogy to IBV and MHV it seems likely that
expression of the HCV ORF1b is mediated by a (-1)
ribosomal frame-shifting event during translation of the
genomic RNA in the region of the ORF1a/ORF1b overlap.
The consensus sequence of the cDNA clones which
encompass this region is shown in Fig. 5A. The
sequence UUUAAAC, which is found at position
12,514-12,520 in the HCV sequence, 27 bases upstream of
the ORF1a termination codon, is identical to the
slippage site of IBV (Brierley et al., 1992) and the putative
slippage site of MHV-JHM (Lee et al., 1991) and
MHV-A59 (Bredenbeek et al., 1990). The HCV sequences 3'
of this site can also be folded into a tertiary RNA structure,
the pseudoknot, which is the second element required
for efficient frame-shifting (Brierley et al., 1989). Figure
5B illustrates a pseudoknot structure in which the stem
S1 is formed by base pairing of nucleotides
12,528-12,537 and 12,546-12,555 and the stem S2 is formed
by base pairing of nucleotides 12,541-12,545 and
12,723-12,727 (overlined in Fig. 5A). This structure
would necessitate an L1 loop of 3 bases and an L2
loop of 168 bases, values that exceed the minimum
required length (Brierley et al., 1991). Clearly, further
experimental evidence will be needed to confirm or
refute this model.

Fig. 7. Putative functional domains of the HCV 229E ORF1b translation product. The amino acid sequences of ORF1b of HCV 229E, MHV, and
IBV were aligned using the UWGCG program PileUp (default settings) and the pam250.cmp comparison table. (A) The RNA polymerase domain,
(B) the metal binding domain, (C) the helicase domain. In (A) the characteristic GDD (SDD) motif is highlighted and the polymerase domains I to
VIII (Koonin, 1991) are shaded. In (B) the conserved Cys/His residues which may be involved in metal ion ligation (Lee et al., 1991) are
shown in bold type and marked with an asterisk. In (C) the characteristic "A" and "B" sites (Gorbalenya and Koonin, 1989) are shown and
conserved residues are similarly highlighted. The numbering of the aligned sequence is for reference only.

Fig. 8. Position of the putative functional domains in the gene 1 translation products of HCV 229E, MHV, and IBV. The figure is drawn to scale
although the boundaries of the motifs cannot be defined precisely. PLP, papain-like protease; 3CL, 3C-like protease; GFL, growth
factor/receptor-like; POL, polymerase module; MBD, metal binding domain; HEL, helicase (NTP-binding) domain. The arrow indicates the position of the
ORF1a/ORF1b junction.

To confirm that the HCV 229E ORF1a/ORF1b
overlap region is able to mediate (-1) ribosomal
frame-shifting we constructed two plasmids for the in vitro
transcription of mRNA (Fig. 5C). Plasmid pFS contains
the putative frame-shifting region (nucleotides
12,293-13,557) flanked by and in frame with DNA
encoding the amino- and carboxy-terminal regions of the
E. coli β-glucuronidase (GUS) protein. Plasmid pAFS
was identical, except that the HCV 229E sequences
extended only from position 12,293 to 12,720, i.e., did
not include the pentanucleotide sequence CGAGC
which is complementary to the sequence GCUCG
which we propose to be in the S2 stem of the
pseudoknot structure. The plasmids were linearized with AflII
or BstEII (pFS) or BamHI (pAFS) and capped SP6 run-off
transcripts were synthesized in vitro. The transcripts
were translated in rabbit reticulocyte lysate and the
results are shown in Fig. 5D.

The pFS/AflII transcript directed the synthesis of
34,000 and 49,000 molecular weight proteins. The
pFS/BstEII transcript directed the synthesis of 34,000
and 66,000 molecular weight proteins and the
pAFS/BamHI transcript directed the synthesis of a 34,000
molecular weight protein. By reference to Fig. 5C, it
can be seen that these are the results expected if the
HCV sequence of pFS mediates (-1) ribosomal
frame-shifting and the proposed pentanucleotide
base-pairing interaction is necessary to produce a functional
frame-shifting element. A quantitative PhosphorImager
analysis of the data shown in Fig. 5D indicates that in
the pFS transcripts frame-shifting occurs at a
frequency between 18 and 30%.
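One common way to express a frame-shifting frequency from such data is sketched below. The text does not state how the band intensities were normalized, so this is an assumption rather than the procedure used here: each signal is divided by the number of methionine residues in the corresponding product, and the frequency is the frame-shift product as a fraction of all products. The function name, signal values, and methionine counts are hypothetical placeholders, not measurements from this study.

```python
def frameshift_frequency(stop_signal, shift_signal, stop_met, shift_met):
    """Estimate the fraction of ribosomes that frame-shift.

    stop_signal / shift_signal: integrated band intensities of the
    termination and frame-shift products (arbitrary units).
    stop_met / shift_met: number of labeled methionine residues in each
    product, used to convert signal to relative molar amounts.
    """
    stop_molar = stop_signal / stop_met
    shift_molar = shift_signal / shift_met
    return shift_molar / (stop_molar + shift_molar)


# Hypothetical values, chosen only to illustrate the calculation.
frequency = frameshift_frequency(
    stop_signal=8000.0, shift_signal=4000.0, stop_met=4, shift_met=8
)
print(f"{frequency:.0%}")
```

Normalizing by methionine content matters because the longer frame-shift product incorporates more label per molecule than the termination product, so raw intensities would overstate the frequency.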

Careful analysis of the translation products directed
by the pAFS/BamHI transcript reveals a protein of
73,000 molecular weight, which would be expected if
(-1) frame-shifting has occurred. The amount of
protein synthesized represents a frame-shifting frequency
of < 1%. We believe this could be explained by a less
stable S2 stem formed between the GCUCG
pentanucleotide at position 12,541-12,545 and the sequence
CGUGC located at nucleotides 12,586-12,590.
Further studies are required to confirm this interpretation.
We have also noted that in all translations the
N-GUS-HCV ORF1a product, predicted to have a molecular
weight of 30,700, has a slower electrophoretic mobility
than expected. At the moment we have no explanation
for this anomaly.
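The difference between the proposed S2 stem and the alternative, less stable stem can be illustrated by counting Watson-Crick pairs between the pentanucleotides named in the text. This is a minimal sketch that only counts complementary positions in an antiparallel alignment; it does not estimate stability, and the function name is illustrative.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately not counted.
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def count_pairs(upstream, downstream):
    """Count base pairs between two RNA segments, both written 5' to 3'.

    The segments pair in antiparallel orientation, so the first base of
    `upstream` is matched with the last base of `downstream`.
    """
    return sum(
        (a, b) in WATSON_CRICK for a, b in zip(upstream, reversed(downstream))
    )


full_stem = count_pairs("GCUCG", "CGAGC")         # proposed S2 partner (absent from pAFS)
alternative_stem = count_pairs("GCUCG", "CGUGC")  # partner at 12,586-12,590

print(full_stem, alternative_stem)
```

The proposed partner forms five consecutive pairs, whereas the alternative partner leaves a central U-U mismatch and forms only four, which is consistent with the suggestion of a less stable S2 stem in the pAFS transcript.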

ORF1b

Structural features. ORF1b has the potential to
encode a protein of 2686 amino acids with a molecular
weight of 300,300. If, however, (-1) ribosomal
frame-shifting takes place at the slippage site in ORF1a (see
above), the ORF1a/ORF1b fusion protein has a
potential molecular weight of 754,200. The hydrophilicity
profile of the predicted ORF1b translation product
(data not shown) shows both hydrophilic and
hydrophobic regions but none are indicative of extensive
membrane spanning regions. A comparison of the
predicted HCV ORF1b protein with the corresponding
proteins of IBV and MHV using the GAP program (default
settings) indicates 69.7% similarity (53.8% identity) for
the HCV and IBV proteins and 70.5% similarity (54.2%
identity) for the HCV and MHV proteins after optimal
alignment. This high degree of similarity is essentially
uniform over the entire length of all three proteins, as is
evident in the dot matrix comparisons shown in Fig. 6
(program COMPARE, window, 100; stringency, 30;
default comparison table).

Putative functional domains. As with ORF1a, the
HCV ORF1b gene product can be compared with the
ORF1b proteins of MHV and IBV and putative
functional motifs can be identified. The first such motif is
the RNA polymerase element located between amino
acids 534 and 836 (Fig. 7A). The HCV motif aligns well
with the MHV and IBV motifs and can be divided into
eight distinct regions recognized by Koonin (1991) as
characteristic of a wide variety of putative RNA
polymerases. The alteration of the RNA polymerase "core"
sequence Gly-Asp-Asp to Ser-Asp-Asp is maintained
in the HCV ORF1b protein.

The second motif recognized in the HCV protein is
related to the "finger" domain characteristic of
numerous DNA and RNA binding proteins. This motif, located
between amino acids 924-999 in the HCV protein,
consists of a defined sequence of Cys and His
residues. As for the homologous region of the MHV
protein, not all of the residues which were originally
proposed to be involved in the IBV ORF1b metal binding
domain are conserved in the HCV sequence (Fig. 7B)
(Gorbalenya et al., 1989).

The third motif identified in the predicted HCV
protein is the purine NTP binding sequence pattern which
is thought to be a feature of duplex unwinding (i.e.,
helicase) activities (Gorbalenya and Koonin, 1989).
This motif is located in the HCV ORF1b protein at
position 1202-1330 and is highly conserved in comparison
to the same motif in the MHV and IBV proteins
(Fig. 7C).

In addition to the sequence similarities in the RNA
polymerase genes of HCV, IBV and MHV, recent
analyses of arterivirus and torovirus RNA polymerase genes
(Snijder et al., 1990; Kuo et al., 1991; Den Boon et al.,
1991) have revealed evolutionary links between arteri-,
toro-, and coronaviruses. The polymerase core motif,
the finger domain and the NTP binding sequence
pattern described above are found, for example, in the
polymerase genes of equine arteritis virus and Berne
virus. Also, a conserved domain located at the
carboxy-terminus of coronavirus, arterivirus and torovirus
ORF1b proteins has been recognized, but a function
has not yet been proposed (Snijder et al., 1990).

DISCUSSION 

Coronaviruses have been traditionally divided into
four antigenic groups (Holmes, 1990). HCV 229E
belongs to group 1, together with transmissible
gastroenteritis virus (TGEV), canine coronavirus (CCV), feline
infectious peritonitis virus (FIPV) and feline enteric
coronavirus (FECV) (see, however, Sanchez et al., 1990).
Thus the nucleotide sequence of a group 1
(HCV 229E), a group 2 (MHV), and a group 3 (IBV)
coronavirus is now available. HCV 229E is also the first
human coronavirus to be entirely sequenced and we
hope that many questions concerning the biology and
pathogenesis of these viruses can now be investigated
more easily.

The HCV 229E gene 1 is comparable in size and
organization to gene 1 of IBV and MHV. The predicted
gene product displays a number of structural features
and putative functional domains (Fig. 8). These include
functions related to RNA synthesis (POL, MBD, and
HEL) in the ORF1b gene product and proteolytic
activities (PLP and 3CL) in the ORF1a gene product. Our
experiments and those of others (Brierley et al.,
1987; Lee et al., 1991; Bredenbeek et al., 1990) show
that expression of these functions can be regulated via
the mechanism of ribosomal frame-shifting. At the
same time, we predict that in vivo they are also likely to
be coordinated by the activation or inactivation of one
set of functions (RNA synthesis) by the other
(proteases). Clearly, it will be a difficult task to unravel
these complex interactions. However, the availability
of a complete set of cDNAs encompassing the HCV
polymerase gene serves as a useful starting point.

First, the cDNAs can be used to generate a
collection of immunological reagents which facilitate the
analysis of polymerase gene expression in
HCV-infected cells. Without such reagents it will be very
difficult to identify and characterize the low amounts of
gene 1 products which can be expected. In this
respect, an important step forward has also been the
recent identification of the cellular receptor for HCV
229E as aminopeptidase N (Yeager et al., 1992). This
finding may allow the development of better cell culture
systems for the biochemical analysis of HCV
replication.

Second, the HCV polymerase cDNAs together with
recently developed vaccinia virus vectors (Merchlinsky
and Moss, 1992) should make it possible to
(over)express the HCV 229E polymerase gene in eucaryotic
cells. This will also facilitate the analysis of polymerase
gene expression and, more importantly, provide an
opportunity to investigate the function of polymerase
gene products via reverse genetics. Experiments
toward these goals are in progress.